Vote Embedding:

The general idea comes from dimensionality reduction. Scaling approaches such as NOMINATE, Wordfish and their variants all perform some form of it: given MEPs, documents, words or votes as features, they project points into a lower-dimensional space, usually of 1 or 2 dimensions.

Alternatives to W-NOMINATE aren't significantly "better" at clustering MEPs using their votes as features, but are other methods still useful?

Given MEPs and their voting records, there are many parameters and variations to choose from:

  1. Representation: how to represent votes as a matrix
    • Vote x MEP matrix where each vote is represented as 3 vectors: "Yes", "No", "Abstain"
    • Alternatives: Yes/No only, treating abstentions as missing.
    • (Other ways)
  2. Dimensionality reduction:
    • None: matrix of votes
    • NMF: nndsvd initialization -> NMF decomposition
    • word2vec
    • W-NOMINATE
  3. Visualisation:
    • t-SNE (t-distributed stochastic neighbour embedding): can be over-interpreted (the clustering of points is more important than their x, y positions)
    • W-NOMINATE plots
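
As a rough illustration of the representation step, the sketch below one-hot encodes voting records into a matrix with one row per MEP and one column per (vote, choice) pair. The function name `encode_votes`, the record layout `(mep, vote_id, choice)` and the row/column orientation are assumptions made for this example; the notebook does not show how its matrices were actually built.

```python
import numpy as np

CHOICES = ("Yes", "No", "Abstain")

def encode_votes(records, meps, votes, choices=CHOICES):
    """Return a 0/1 matrix of shape (len(meps), len(votes) * len(choices)).

    `records` is an iterable of (mep, vote_id, choice) tuples (hypothetical
    layout). An MEP with no record for a vote keeps an all-zero block,
    i.e. the vote is treated as missing.
    """
    mep_idx = {m: i for i, m in enumerate(meps)}
    vote_idx = {v: j for j, v in enumerate(votes)}
    choice_idx = {c: k for k, c in enumerate(choices)}
    X = np.zeros((len(meps), len(votes) * len(choices)), dtype=np.int8)
    for mep, vote, choice in records:
        if choice not in choice_idx:
            continue  # e.g. abstentions dropped in the Yes/No-only variant
        col = vote_idx[vote] * len(choices) + choice_idx[choice]
        X[mep_idx[mep], col] = 1
    return X

# Toy data, not real voting records.
records = [
    ("mep_a", "v1", "Yes"), ("mep_a", "v2", "No"),
    ("mep_b", "v1", "Abstain"), ("mep_b", "v2", "No"),
    ("mep_c", "v1", "Yes"),
]
meps = ["mep_a", "mep_b", "mep_c"]
votes = ["v1", "v2"]

X = encode_votes(records, meps, votes)                               # Yes / No / Abstain
X_yes_no = encode_votes(records, meps, votes, choices=("Yes", "No"))  # abstentions as missing
```

Passing a shorter `choices` tuple gives the Yes/No-only variant, in which an abstention leaves the same all-zero block as a missing vote.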

The quality of each approach is evaluated with silhouette scores, a measure of cluster quality that ranges from -1 (points sit closer to another cluster than to their own, i.e. incorrect clustering) to +1 (dense, well-separated clusters). Scores around zero indicate overlapping clusters.
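
To make the score concrete, here is a minimal NumPy sketch of a per-group silhouette score, using the political group as the cluster label. It assumes Euclidean distance and averages the per-point scores within each group; the helper name `silhouette_by_group` is made up for this example, and the notebook's own scores may have been computed differently (for instance with a library routine).

```python
import numpy as np

def silhouette_by_group(points, labels):
    """Mean silhouette score per label, from Euclidean distances.

    For each point: a = mean distance to the other points with the same
    label, b = smallest mean distance to the points of any other label,
    score = (b - a) / max(a, b). Needs at least two distinct labels.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise distance matrix
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            continue  # convention: a singleton group scores 0
        a = dist[i, same].sum() / (n_same - 1)  # dist[i, i] == 0, so self is excluded
        b = min(dist[i, labels == g].mean() for g in groups if g != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return {str(g): float(scores[labels == g].mean()) for g in groups}

# Toy 2D coordinates: two tight, well-separated clusters.
coords = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_by_group(coords, ["A", "A", "B", "B"])  # labels match the clusters
bad = silhouette_by_group(coords, ["A", "B", "A", "B"])   # labels cut across the clusters
```

With labels that match the geometry the scores are close to +1; with labels that cut across it they turn negative, which is the pattern to look for in the per-group tables that follow.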

In [1]:
from IPython.display import display, HTML, Image

7th Term - 2009-2014

W-NOMINATE (the two dimensions are interpreted as Left/Right and Pro/Anti-EU):

In [2]:
display(HTML(open('term7-wnominate-viz.html').read()))
In [3]:
display(HTML(open('term7-wnominate-score.html').read()))
   group    meps   score
2  ALDE     93.0   0.703240
4  ECR      57.0   0.417126
6  EFD      31.0  -0.314953
3  EPP      296.0  0.719467
1  EUL_NGL  44.0   0.532448
7  G_EFA    65.0   0.790769
5  NI       38.0  -0.406864
0  S_D      211.0  0.503859

Using the same voting records, but a different way of reducing dimensions with NMF:

In [4]:
Image(filename='term7-3d-nmf.png')
Out[4]:
In [5]:
display(HTML(open('term7-nmf-score.html').read()))
   group    meps   score
5  ALDE     87.0   0.437510
2  ECR      54.0   0.538264
0  EFD      33.0  -0.348903
7  EPP      273.0  0.766000
3  EUL_NGL  37.0   0.373987
4  G_EFA    57.0   0.350900
6  NI       33.0  -0.380339
1  S_D      193.0  0.240743
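
The NMF step could look roughly like the sketch below, which uses scikit-learn's `NMF` with `init="nndsvd"` and three components to match the 3D plot. The input is random synthetic data standing in for a non-negative MEP x vote matrix; the matrix orientation, `max_iter` and `random_state` values are assumptions, not settings taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic stand-in for a 0/1 vote matrix: 40 "MEPs" x 60 one-hot vote columns.
X = (rng.random((40, 60)) < 0.3).astype(float)

# nndsvd seeds the factors from an SVD of X instead of random values.
model = NMF(n_components=3, init="nndsvd", max_iter=500, random_state=0)
W = model.fit_transform(X)  # (40, 3): one 3D coordinate per MEP
H = model.components_       # (3, 60): loading of each vote column on each component
```

Each row of `W` can then be plotted as a point, either in 3D or as pairwise 2D projections.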

Plotting the 3D NMF output as separate 2D projections (x-y and x-z): can these be interpreted similarly to the W-NOMINATE dimensions?

In [6]:
display(HTML(open('term7-count-nmf-xy.html').read()))
In [7]:
display(HTML(open('term7-count-nmf-xz.html').read()))

Using t-SNE dimensionality reduction on the vote matrix: the t-SNE x and y dimensions aren't meaningful in the same way as W-NOMINATE's, but similar points (MEPs) should cluster together:

In [8]:
display(HTML(open('term7-count-tsne-plot.html').read()))
In [9]:
display(HTML(open('term7-tsne-score.html').read()))
   group    meps   score
5  ALDE     87.0   0.554916
2  ECR      54.0   0.688848
0  EFD      33.0  -0.279263
7  EPP      273.0  0.508937
3  EUL_NGL  37.0   0.533064
4  G_EFA    57.0   0.611554
6  NI       33.0  -0.318885
1  S_D      193.0  0.458861
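
A hedged sketch of running t-SNE directly on a vote matrix with scikit-learn's `TSNE` follows. The data are two synthetic voting blocs, and the `perplexity`, `init` and seed values are illustrative assumptions rather than the notebook's settings. Running it with two seeds is a simple way to check which features of a plot are stable.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two synthetic "blocs" of 20 MEPs with opposite voting tendencies on 50 votes.
bloc_a = (rng.random((20, 50)) < 0.8).astype(float)
bloc_b = (rng.random((20, 50)) < 0.2).astype(float)
X = np.vstack([bloc_a, bloc_b])

def embed(seed):
    """Project the vote matrix to 2D; perplexity must stay below the number of rows."""
    tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=seed)
    return tsne.fit_transform(X)

coords_seed0 = embed(0)
coords_seed1 = embed(1)  # same data, different random initialisation
```

Comparing the two embeddings side by side shows which groupings persist across runs; the absolute x and y positions are not expected to match.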

Treating Yes / No / Abstain votes and MEPs as "words" and "contexts", word2vec can be used:

In [10]:
display(HTML(open('term7-sgns-tsne-sgns-plot.html').read()))
In [11]:
display(HTML(open('term7-tsne-sgns-score.html').read()))
group meps score
5  ALDE     87.0   0.457202
2  ECR      54.0   0.559123
0  EFD      33.0  -0.442806
7  EPP      273.0  0.083764
3  EUL_NGL  37.0   0.566151
4  G_EFA    57.0   0.504731
6  NI       33.0  -0.125213
1  S_D      193.0  0.366300

Going from votes to word2vec to t-SNE doesn't cluster MEPs as well as going from votes to t-SNE directly, but it does produce an alternative view. Is this useful? (The advantage is that word2vec is fast, but combined with t-SNE it is more unstable: different runs produce similar clusters, but t-SNE may arrange them differently.)
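
The "words" and "contexts" framing used for the word2vec runs above can be sketched as a pair-generation step. Here each (vote, choice) token is treated as a word and the MEP who cast it as its context; that assignment, the `vote_id:choice` token format and the helper name `vote_pairs` are assumptions for illustration, and the skip-gram training itself is left to whichever word2vec implementation is used.

```python
from collections import Counter

def vote_pairs(records):
    """Yield (word, context) pairs: word = 'vote_id:choice', context = MEP."""
    for mep, vote, choice in records:
        yield (f"{vote}:{choice}", mep)

# Toy data, not real voting records.
records = [
    ("mep_a", "v1", "Yes"), ("mep_a", "v2", "No"),
    ("mep_b", "v1", "Abstain"), ("mep_b", "v2", "No"),
    ("mep_c", "v1", "Yes"),
]

pairs = list(vote_pairs(records))
word_counts = Counter(word for word, _ in pairs)        # how often each vote token occurs
context_counts = Counter(context for _, context in pairs)  # how many votes each MEP cast
```

MEPs that share many vote tokens end up with similar context statistics, which is what a skip-gram model with negative sampling would exploit when learning their vectors.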

6th Term - 2004-2009

W-NOMINATE Approach:

In [12]:
display(HTML(open('term6-wnominate-viz.html').read()))

W-NOMINATE Silhouette scores for groups:

In [13]:
display(HTML(open('term6-wnominate-score.html').read()))
   group    meps   score
5  ALDE     126.0  0.683406
2  EPP-ED   340.0  0.301569
1  EUL_NGL  48.0   0.709275
6  G_EFA    44.0   0.714473
7  IND_DEM  27.0  -0.124533
4  NI       39.0  -0.051689
0  PES      264.0  0.625650
3  UEN      51.0   0.160550

Using NMF: (Not as clear as with 7th Term)

In [14]:
Image(filename='term6-3d-nmf.png')
Out[14]:
In [15]:
display(HTML(open('term6-nmf-score.html').read()))
   group    meps   score
6  ALDE     128.0 -0.175533
0  EPP-ED   340.0 -0.022568
3  EUL_NGL  48.0  -0.189733
4  G_EFA    44.0   0.284077
2  IND_DEM  27.0  -0.105155
7  NI       39.0  -0.101654
5  PES      264.0 -0.181478
1  UEN      51.0  -0.048974
In [16]:
display(HTML(open('term6-count-nmf-xy.html').read()))
In [17]:
display(HTML(open('term6-count-nmf-xz.html').read()))

t-SNE on votes directly:

In [18]:
display(HTML(open('term6-count-tsne-c-plot.html').read()))
In [19]:
display(HTML(open('term6-tsne-c-score.html').read()))
   group    meps   score
6  ALDE     128.0 -0.012771
0  EPP-ED   340.0 -0.044030
3  EUL_NGL  48.0  -0.167438
4  G_EFA    44.0   0.531224
2  IND_DEM  27.0  -0.261050
7  NI       39.0   0.003676
5  PES      264.0  0.024927
1  UEN      51.0   0.069282

Votes to word2vec to t-SNE:

With this approach the Socialist group (PES) is split; this could be a visualisation artefact.

In [20]:
display(HTML(open('term6-sgns-tsne-plot.html').read()))
In [21]:
display(HTML(open('term6-tsne-sgns-score.html').read()))
   group    meps   score
6  ALDE     128.0 -0.138645
0  EPP-ED   340.0  0.192428
3  EUL_NGL  48.0  -0.156845
4  G_EFA    44.0   0.539623
2  IND_DEM  27.0  -0.135726
7  NI       39.0  -0.195074
5  PES      264.0 -0.054873
1  UEN      51.0   0.271890